allenkaci_LATE_92524_8341950_Progress Report 4.3.2020.pdf
  • Name: Internship Organization: Mentor Name: Raj Shah Dates of 2-week period covered:
  • Task 2: Created dashboards of informative and descriptive statistics that were presented to upper management.
  • A system was already in place, but it was not easy to categorize or distinguish between constituents with it.
  • The current system was slightly modified to include all needed data, as well as the categories each component was labeled as.
  • This ABF module was used to make daily (sometimes multiple times a day) updates to the tracker.
  • This tracker was presented multiple times to upper management to give a summary of the ABF business unit.
  • A database pulling from the system of record is being distributed, and I am utilizing it constantly.
  • Task 2 outcome: A few spreadsheet dashboards have been created to report on an ongoing basis what is being asked of ABF.
  • Built a standardized tracking system that includes primary database keys that were previously not being used.
  • I have been adding these keys to all ad-hoc reports to help cross-check and maintain an accurate and efficient tracking system.
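The key cross-check described above can be sketched with plain set operations. This is a minimal illustration, not the report's actual code; the key name `constituent_id` and the sample records are hypothetical.

```python
# Sketch: cross-checking two ad-hoc reports by a shared primary key so
# discrepancies surface early. Key name and records are made up.

def cross_check(report_a, report_b, key="constituent_id"):
    """Return the keys missing from each report."""
    keys_a = {row[key] for row in report_a}
    keys_b = {row[key] for row in report_b}
    return {
        "missing_from_b": sorted(keys_a - keys_b),
        "missing_from_a": sorted(keys_b - keys_a),
    }

tracker = [{"constituent_id": 1}, {"constituent_id": 2}, {"constituent_id": 3}]
ad_hoc = [{"constituent_id": 2}, {"constituent_id": 3}, {"constituent_id": 4}]
result = cross_check(tracker, ad_hoc)
```

Running the same check against every ad-hoc report keeps the tracker and the system of record in agreement.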
almasriosama_94996_8338669_Report5_Almasri.pdf
  • Name: Osama AlMasri Internship Organization: WestRock Company Mentor/Preceptor's Name: Mitesh Patel Dates of 2-Week Period Covered: 3/23/2020-4/3/2020
  • Current Tasks: My project will be within the Human Resources (HR) group at WestRock.
  • The fourth two-week period of the internship went slightly off track because our team received some urgent analytics requests from leaders in the organization.
  • Task 3 was to bridge the gap between statistics and the business if the model requires it.
  • Task 4 (added now) is to work on the urgent requests received.
  • Task 4 progress: We received a request from the Finance organization to build a dashboard that shows different employee- and employer-paid benefits and taxes in the previous two fiscal years.
  • I then had to seek explanations from subject matter experts to identify which transactions were employer versus employee paid.
  • I used QlikView to visualize the different aspects of the request: by business unit, by payer entity, and by union status.
  • We then received another request to build a visual analytics solution using Power BI.
  • I took an aggregation of the data from QlikView, entered it into Power BI, and built a dashboard that was then sent to leadership.
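The aggregation step behind the dashboard can be sketched as a grouped sum over the three dimensions mentioned (business unit, payer entity, union status). This is an illustrative stand-in, not WestRock's pipeline; the field names and amounts are hypothetical.

```python
# Sketch: aggregating benefit transactions by the three dashboard dimensions
# before loading the result into a visualization tool. Data is made up.
from collections import defaultdict

transactions = [
    {"business_unit": "Mill", "payer": "employer", "union": "union", "amount": 100.0},
    {"business_unit": "Mill", "payer": "employee", "union": "union", "amount": 40.0},
    {"business_unit": "Box", "payer": "employer", "union": "non-union", "amount": 60.0},
    {"business_unit": "Mill", "payer": "employer", "union": "union", "amount": 25.0},
]

totals = defaultdict(float)
for t in transactions:
    totals[(t["business_unit"], t["payer"], t["union"])] += t["amount"]
```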
bapatanjali_126445_8316575_Progress Report 5.pdf
  • Name: Anjali Bapat Internship Organization: GoodRoads LLC Mentor's Name: Chris Sunde Dates of first report covered: Mar 23rd - Apr 1st
  • Task 2: Creating a data pipeline to load data into a Google Cloud database and create route optimization.
  • Specific steps/progress Task 1: Cleaning the data collected from OpenStreetMap. We want to onboard a few cities, but we could not find relevant data on the internet.
  • We even asked city authorities to provide the data, but sometimes that does not work.
  • We work only with city-owned streets, so the main task is to filter those streets by going back and forth between the QGIS map view and SQL filtering.
  • Task 2: Creating a data pipeline to load data into a Google Cloud database and create route optimization.
  • - Created the data pipeline and routes for almost all the cities that were on our list.
  • - Got a data subset of one type of maintenance for Matthews and will apply the same to the others.
  • Pulling data from OpenStreetMap and looking for the desired street types.
  • Perform linear regression to predict ratings.
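The regression step above can be sketched with the closed-form least-squares fit. This is a minimal one-feature illustration under assumed inputs (a hypothetical "years since last maintenance" predictor); the project's actual feature set is not stated.

```python
# Sketch: simple least-squares fit to predict a road's condition rating
# from one hypothetical feature (years since last maintenance).

def fit_line(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

years = [0, 2, 4, 6, 8]
rating = [10.0, 9.0, 8.0, 7.0, 6.0]  # rating drops half a point per year here
slope, intercept = fit_line(years, rating)
predicted = slope * 10 + intercept   # predicted rating at year 10
```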
bartonronald_LATE_125819_8361870_Progress Report 5.pdf
  • We also worked on code testing for the ULM rewrite; it was mostly a QA exercise to help the primary coder and report findings/issues.
  • As a refresher, to start the process I created a list of the columns right before the model runs and exported it to CSV.
  • This is mostly manual work: looking at the variable names and descriptions from a data dictionary and assigning a category.
  • We ran Python univariate scripts that generate exposure charts over loss, premium, peril, etc.
  • After having these univariate charts, which the script creates in Excel, you can compare similar variables and see if there are problems or bad data.
  • Another example would be comparing two similar variables that are trying to show the same data; if one has a lot of missing records, the other is likely the better choice.
  • Throughout the process we discovered various errors and presented them to the developer to fix and update the GitHub code.
  • We finished testing all the models by the Sprint Review and ironed out any issues in the new code!
  • It helps to have a team do some dedicated testing, like me and the other intern in this case, so the developer can work on deploying changes quickly.
  • Data science certainly requires some domain knowledge from within your industry as well; we had many meetings with business partners to get more information during our variable testing exercise.
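The "pick the variable with fewer missing records" comparison above can be sketched in a few lines. The variable names and values are hypothetical; the real review compared Excel univariate charts.

```python
# Sketch: choosing between two variables that encode the same information
# by comparing their missing-record rates (one piece of the univariate review).

def missing_rate(values):
    missing = sum(1 for v in values if v is None or v == "")
    return missing / len(values)

pool_flag_a = ["Y", "N", None, "Y", None, None]  # hypothetical variable 1
pool_flag_b = ["Y", "N", "N", "Y", "N", "Y"]     # hypothetical variable 2

better = ("pool_flag_a" if missing_rate(pool_flag_a) < missing_rate(pool_flag_b)
          else "pool_flag_b")
```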
cantyjeremiah_35429_8303916_Report5_Canty.pdf
  • Name: Jeremiah Canty Internship Organization: UNC Charlotte Mentor/Preceptor's Name: Dr. Doug Hauge Dates of 2-Week Period Covered: 3/21/2020 – 4/3/2020
  • Task I: Learn more about certain techniques in Python to manipulate the visualizations into the type of display I want/need.
  • Task I progress: I first had to research Matplotlib and use trial and error to configure the visualizations to the specific results I wanted, and to analyze and display the data in the correct format.
  • Have started drawing conclusions and interesting findings from the visualizations that will help us predict run times.
  • I created two new charts to help explore the data quality differences among years.
  • I also had to create visualizations that display the slope and intercept for each year and contrast them to draw conclusions.
  • Task I outcomes: Manipulated the data to create more detailed visualizations.
  • I have edited and created a chart that displays the regression of boys' athletes separated by running season year, with the charts on equal scales.
  • Task III outcomes: I created insightful visualizations that display relationships.
  • Using Excel, I created charts that display the relationship between each year's regression line (intercepts versus slopes) and the progression of the athletes.
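The per-year slope/intercept comparison described above can be sketched by fitting one regression line per season year. The race times below are invented for illustration; the real charts were built in Matplotlib and Excel.

```python
# Sketch: fitting a regression line per running-season year so slopes and
# intercepts can be contrasted on equal scales. Times are hypothetical.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return slope, my - slope * mx

# meet number -> 5K time in minutes, per season year
seasons = {
    2018: ([1, 2, 3, 4], [20.0, 19.5, 19.0, 18.5]),
    2019: ([1, 2, 3, 4], [19.0, 18.8, 18.6, 18.4]),
}
fits = {year: fit_line(xs, ys) for year, (xs, ys) in seasons.items()}
# A steeper negative slope (2018 here) means faster in-season improvement.
```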
cardenasarturo_27332_8320477_Report5_Cardenas.pdf
  • Name: Arturo Cardenas Internship Organization: APHI - Academy for Population Health Innovation Mentor/Preceptor's Name: Dr. Michael Dulin Dates of 4-Week Period Covered: March 21st to April 3rd.
  • Changing projects, due to Mecklenburg County closing offices to prevent the spread of the COVID-19 virus.
  • After some conversations with Mr. Bynum, we concluded that it would be difficult to finish this project before the end of April.
  • I asked Dr. Dulin for permission to start working on a backup project using open-source datasets.
  • Use all available data to try to find new patterns: I will use cluster analysis and also predict some dependent variables, possibly predicting diabetes type I or type II using logistic regression or random forest.
  • Task III progress: The data science part. I have not started on this task yet, but I will start next week; I am planning to predict type I or type II diabetes participants using logistic regression and random forest, and I am also planning a cluster analysis.
  • Task I outcomes: The data design and specs for the project; I will do an entity-relationship diagram.
  • Task II outcomes: Design and build an easy-to-use R Shiny dashboard, as illustrated on the next page.
  • Task III outcomes: Cluster analysis on some of the variables, and logistic regression to predict medical conditions such as diabetes types I and II.
  • (b) It is better to stay at home to avoid contact with the contagious disease. (c) It's always good to have a backup project, even one built from open-source datasets.
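The logistic-regression plan above can be sketched with a tiny from-scratch trainer. This is a minimal illustration under assumed data (one made-up normalized lab value and 0/1 labels), not the project's eventual model.

```python
# Sketch: logistic regression by stochastic gradient descent for a binary
# outcome (e.g., diabetes type I vs type II). Data is invented for illustration.
import math

def train_logistic(X, y, lr=0.5, epochs=2000):
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            err = p - yi                     # gradient of log loss wrt z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0

X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]  # one hypothetical feature
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
preds = [predict(w, b, xi) for xi in X]
```

In practice a library implementation (e.g., scikit-learn) would replace this loop; the sketch just shows the mechanics.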
copeblake_23876_8339517_Progress Report 5.pdf
  • Name: Blake Cope Internship Organization: Sports Business Journal Mentor's Name: Derick Moss Dates of 2-Week Period Covered: 3/23/20 – 4/3/20
  • I am continuing my work with the Sports Consumer research data.
  • The survey focuses on sports fans' viewing habits for each of the major professional leagues (MLB, NBA, NFL, NHL, and MLS).
  • For example, I found that the most popular reason for interest in MLB was fantasy/gambling, while the next two biggest reasons were fans who play video games and interest in watching the highest level of play.
  • I repeated this process and presented my findings to a couple of our data analysts.
  • I found that five was the optimal number of features to use to model MLB interest, but the accuracy was low, so I may do some further data cleaning to improve it.
  • Presenting my work to some of the fellow data analysts made me feel a little
  • Communicating results is definitely a skill I need to work on in order to accomplish my career goals, so getting the opportunity to do so last week was great.
  • This was my first time using a feature selection method, so I was glad to gain
  • Continue to work towards creating a predictive model for this data.
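The feature-selection idea above ("keep the top k features") can be sketched with a simple correlation-based filter. The report does not say which method was used, so this is an assumed stand-in; the column names and scores are hypothetical.

```python
# Sketch: score features by absolute Pearson correlation with the target and
# keep the top k, a simple stand-in for the feature-selection step.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_top_k(features, target, k):
    scores = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# hypothetical survey columns vs. an MLB-interest score
target = [1, 2, 3, 4, 5]
features = {
    "fantasy_gambling": [1, 2, 3, 4, 5],  # perfectly correlated
    "random_noise": [3, 1, 4, 1, 5],
}
best = select_top_k(features, target, k=1)
```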
demirelif_126087_8320437_Report5_Demir.pdf
  • Progress Report 5
  • Name: Elif Demir Internship Organization: Innovation Partners LLC Mentor Name: Ellen Jiao Dates of 2-Week Period Covered: 16 March – 2 April
  • Current Task
  • Due to COVID-19, I was not able to make progress on my project during this period.
  • My mentor did not allow me to work remotely.
  • Therefore, progress report 4 was my last report for the internship.
  • In this period, I made progress on my internship presentation and report.
  • Specific Steps | Progress
  • Takeaways | Lessons Learned
  • 2 Week Plan
duttaroma_117177_8339560_ProgressReport5_dutta.pdf
  • Internship Organization : Wells Fargo
  • Mentor / Preceptor 's Name : Subhabrata Mukherjee
  • As part of my task, I have started on the initial model design.
  • Research has also been performed in this period to check whether we could leverage AWS services to build our model and subsequently deploy/host it in a cloud environment.
  • Specific steps / progress
  • Initial model design
  • Feasibility study for cloud deployment
  • No specific outcome to share for this week.
  • Better understanding of regularization; a study of pre-trained language models; the concept of pre-trained sentence vectors.
  • Model building; testing; validation; converting model output into a tangible score; creating visualizations; creating the necessary documentation and uploading it to GitHub/Bitbucket; final validation and model tuning if required.
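The regularization takeaway above can be illustrated with the one-feature ridge closed form, which makes the shrinkage effect of the penalty visible. This is a toy example on invented data, not part of the Wells Fargo model.

```python
# Sketch: L2 (ridge) regularization on one centered feature. The closed form
# w = sum(x*y) / (sum(x^2) + lam) shows the penalty shrinking the coefficient.

def ridge_weight(xs, ys, lam):
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]  # centered feature
ys = [-4.0, -2.0, 0.0, 2.0, 4.0]  # y = 2x exactly
w_ols = ridge_weight(xs, ys, 0.0)     # ordinary least squares: w = 2
w_ridge = ridge_weight(xs, ys, 10.0)  # heavier penalty pulls w toward 0
```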
gargdivya_LATE_116438_8371202_Progress Report 5.pdf
  • Name: Divya Garg Internship Organization: Open Data Nation Mentor/Preceptor's Name: Carey Anne Nadeau Dates of 2-Week Period Covered: 20th Mar - 3rd Apr
  • These two weeks I worked on the first part of State Task 5, as this task is divided into two parts: matching the road network to weather data, and combining the weather data with the big crash data frame.
  • For the first part of this task, I used Google BigQuery to extract weather data using SQL queries to match it with roads, after which it was matched in Google Datalab by creating a matching algorithm.
  • For State Task 5, the goal was to extract weather data from Google BigQuery and attach it to weather stations chosen based upon the criteria set by the team.
  • After extracting the chosen station data, I wrote SQL queries to remove weather stations that had
  • I was able to extract and clean all the required weather stations that will be needed for further analysis and was able to plot that data.
  • Worked with Google BigQuery to extract data.
  • Worked with Google Datalab to clean, analyze, and visualize the required data.
  • I will be working on part 2 of State Task 5, which is merging the weather data with the road network and crash files.
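The road-to-station matching step can be sketched as a nearest-neighbor lookup by great-circle distance. The report does not describe the actual matching algorithm, so this is an assumed illustration; the station IDs and coordinates are hypothetical.

```python
# Sketch: matching each road segment to its nearest weather station by
# haversine (great-circle) distance. All coordinates are made up.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

stations = {"KCLT": (35.21, -80.94), "KGSO": (36.10, -79.94)}
roads = {"segment_1": (35.23, -80.84), "segment_2": (36.07, -79.79)}

nearest = {
    seg: min(stations, key=lambda s: haversine_km(*coord, *stations[s]))
    for seg, coord in roads.items()
}
```

In BigQuery the same idea can be expressed with `ST_DISTANCE` over geography columns; the Python version just makes the logic explicit.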
griderhansen_95247_8328777_Progress_Report_5_Grider.pdf
  • Task 1: Complete my first project deliverable, which will be a Jupyter notebook containing my exploratory data analysis (EDA) for FINRA complaint code classification.
  • Since classification is a relatively simple NLP task for which a bag-of-words (BOW) approach is often effective, I will try the most simplified techniques first.
  • In theory, if I've processed the data sufficiently and if word frequencies are indicative of class, these associations should make intuitive sense.
  • c. Look at the ratio of message length per document for each product/problem code. This will give us an idea of how much data (information) we have to train a classification model on for each category that we want to classify.
  • I also studied the format of the communications and designed regex operations to successfully remove punctuation and newlines, replace numerals, and lowercase all tokens.
  • The overall task outcome is a polished EDA notebook that will result in the text being properly preprocessed and analyzed, so that I can proceed to the next phase of the project, which is actually building the classification model(s).
  • Task 1: After receiving the data, I noticed that the text of each customer message appeared to be written to the database in an email-like format.
  • All these communications are collected by various representatives across the company and then passed to CCT as emails that they receive either through their Outlook inboxes or through a system called Unified Workflow (UW).
  • In fact, these things add noise and may need to be systematically removed via Python regular expressions in the text preprocessing phase.
  • In looking at a sample of these subject lines, clients do tend to concisely (if somewhat crudely) state their problem here, as is good practice in any effective email communication.
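The regex cleanup described above (remove punctuation and newlines, replace numerals, lowercase) can be sketched as follows. The sample message and the `<num>` placeholder are illustrative assumptions, not the notebook's actual choices.

```python
# Sketch: regex preprocessing of an email-like complaint message.
import re

def preprocess(text):
    text = text.replace("\n", " ")            # drop newlines
    text = re.sub(r"\d+", "<num>", text)      # replace numerals with a token
    text = re.sub(r"[^\w<>\s]", "", text)     # strip punctuation
    return re.sub(r"\s+", " ", text).strip().lower()

raw = "RE: Account 12345\nClient disputes fee, please advise!"
clean = preprocess(raw)
```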
gulleyalexander_117139_8330573_Report5_Gulley-1.pdf
  • Name: Alexander Gulley Internship Organization: Ally Bank Mentor's Name: Jiamin Lei March 23 - April 3
  • - Continue to tune the model
  • o Added TBATS and Holt-Winters models into the mix. o Added cross validation to the process to find the best model out of sample.
  • Used a forecast horizon of 3 and took quarterly steps for a year out.
  • - Met with stakeholders/subject matter experts
  • o Discussed the model's initial results with stakeholders.
  • o Started outlining the paper and presentations.
  • External regressors do not work with R's forecast package (version 8.5), which drops columns where everything is zero.
  • This is causing an issue during cross validation, as some regressors are only present during the last year.
  • Microsoft Open R's MRAN has not been updated to the newest version of forecast, which fixes this issue.
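The out-of-sample comparison mentioned above is rolling-origin cross validation. The real work used R's forecast package with TBATS and Holt-Winters; here is a language-neutral sketch of the mechanics with a naive "last value" forecaster and an invented series.

```python
# Sketch: rolling-origin cross validation. At each cut point, fit on the
# history, forecast the next `horizon` points, and accumulate the errors.

def naive_forecast(history, horizon):
    return [history[-1]] * horizon  # stand-in for TBATS / Holt-Winters

def rolling_origin_mae(series, horizon, min_train):
    errors = []
    for cut in range(min_train, len(series) - horizon + 1):
        train, actual = series[:cut], series[cut:cut + horizon]
        preds = naive_forecast(train, horizon)
        errors += [abs(p - a) for p, a in zip(preds, actual)]
    return sum(errors) / len(errors)

series = [10, 12, 11, 13, 12, 14, 13, 15]  # hypothetical quarterly values
mae = rolling_origin_mae(series, horizon=1, min_train=4)
```

Comparing this MAE across candidate models selects the best performer out of sample, which is exactly what the cross-validation step does.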
guptasmits_116219_8329708_Progress Report 5.pdf
  • Name: Smitakshi Gupta (Smits) Internship Organization: Institute of Social Capital - UNCC
  • Mentor/Preceptor's Name: Justin Lane Dates of 3-Week Period Covered in the Progress Report: 03/28/2020 to 04/03/2020
  • In the last period (24th to 28th March) I looked more closely into the economy worksheet.
  • I wanted to add the Race column to see the demographics of those requesting food stamps.
  • Task 1: I looked into the health worksheet of the Mecklenburg Quality of Life dataset.
  • The data can be analyzed to see whether the numbers have increased over time.
  • Task I: I did not know anything about the low-cost health care system, so I took time to research how it works.
  • Task 1: In some areas the number of grocery stores has increased, and in others the grocery stores have just moved from one block to another.
  • There has been a little increase in the health care system, but not much was done in that segment.
  • I am hoping to speak to my manager to clear up some of my doubts and see how things go from there.
hakasmaggie_127066_8336414_ProgressReport5Hakas.pdf
  • Name: Maggie Hakas Internship Organization: The Hartford Mentor/Preceptor's Name: Heather Grebe/Lane Coonrod Dates of 2-Week Period Covered: 3/20-4/3
  • This sprint went a lot more smoothly after finally getting settled into working from home.
  • Task I progress: This project continued Ron's and my work together, as we ran the univariates and each took half of the plots to make our recommendations.
  • After running the univariates, we looked at all the plots that were created and decided how we would recommend cleaning up variables.
  • There were examples like having seven variables indicating whether or not there was a pool, or one outlier in a particular year that really skewed the data.
  • We finished all the recommendations and created Excel files linking to the univariates to make them easier for everyone who would need them.
  • Task II outcomes: The recommendations have been passed off and accepted by multiple business partners.
  • I have started research on how certain parameters are chosen in a grid search and how they interact.
  • Currently, Ron and I are going through a smaller list of variables that are trustworthy, so the models can be run without interference from messy data.
  • (b) Created a code file with object-oriented programming (it wasn't a particular task, but I learned it with Heather on the side).
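The grid-search mechanics being researched above can be sketched as an exhaustive sweep over parameter combinations. The parameter names and the scoring function are hypothetical stand-ins for a cross-validated model score.

```python
# Sketch: exhaustive grid search over two hypothetical hyperparameters,
# scoring each combination and keeping the best.
from itertools import product

def score(max_depth, learning_rate):
    # stand-in for cross-validated model accuracy; peaks at depth 4, lr 0.1
    return 0.9 - abs(max_depth - 4) * 0.05 - abs(learning_rate - 0.1)

grid = {"max_depth": [2, 4, 6], "learning_rate": [0.05, 0.1, 0.2]}
names = list(grid)
best_params, best_score = None, float("-inf")
for combo in product(*grid.values()):
    params = dict(zip(names, combo))
    s = score(**params)
    if s > best_score:
        best_params, best_score = params, s
```

The interaction question is visible here too: because each combination is scored jointly, a parameter's best value can depend on the others, which is why the full grid (rather than one parameter at a time) is swept.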
kishorekumarsudha_95533_8336073_Progress Report_04032020.pdf
  • Name: Sudha Kishorekumar Internship Organization: CVS Health Mentor Name: Lisa Klein Dates of 2-Week Period Covered: 03/23/2020 – 04/03/2020
  • Review the MS Access database queries to understand the steps involved in generating the membership report for the Annual HOQ. This is a mandatory step preceding the HEDIS submission.
  • Membership data is retrieved from the Aetna data warehouse as of December 31st of the prior year.
  • Once reviewed, membership data is then summarized based on the line of business and product.
  • The MS Access database queries were analyzed to understand the steps involved in generating the Annual HOQ membership report, due to a lack of documentation.
  • These entities were deemed not eligible for 2020 HEDIS reporting by the business stakeholders.
  • Rollup codes depend on the line of business and product: commercial HMO products are summarized based on the network service area; commercial PPO products are summarized based on the state code; and Medicare HMO and PPO products by H-Contracts and D-SNPs are summarized based on the H-Contract and Plan Benefit Package identifier.
  • The Annual HOQ membership report was successfully migrated to SAS.
  • Technical documentation was developed to outline the steps involved in generating this report.
  • Due to the lack of documentation, this issue was discussed with the business stakeholders to determine the as-of date.
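The rollup rules above can be sketched as a branching key lookup. The field names below are hypothetical (the real logic lives in SAS after the migration), but the branches mirror the summarization rules described.

```python
# Sketch: deriving the rollup key for a membership row per the rules above.
# Field names are assumptions for illustration.

def rollup_key(row):
    lob, product = row["line_of_business"], row["product"]
    if lob == "Commercial" and product == "HMO":
        return row["network_service_area"]
    if lob == "Commercial" and product == "PPO":
        return row["state_code"]
    if lob == "Medicare":  # HMO/PPO by H-Contracts and D-SNPs
        return (row["h_contract"], row["pbp_id"])
    return "UNMAPPED"

row = {"line_of_business": "Commercial", "product": "PPO", "state_code": "NC",
       "network_service_area": None, "h_contract": None, "pbp_id": None}
key = rollup_key(row)
```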
laixinxin_121407_8327779_Report5_Lai.pdf
  • In order to improve the efficiency of the model, the analyst decided to conduct PCA even though there are not too many dimensions in the original dataset.
  • In this way, the executives can be convinced that even if there are a lot of variables in the data later, statistical models can still keep a balance between accuracy and efficiency.
  • After comparing the summaries of the PCAs, the analyst decided to choose the first 2 PCs and combine them with the categorical variables to create df2; a summary of df2 is shown below:
  • It is still worth conducting PCA in order to convince the executive level that if we have many input features but not enough sample data, a statistical model can still find a way to lift the accuracy.
  • And it is intuitive to reach such a conclusion, because the company is a wholesale distributor for NC, SC, and VA, which means the categorical variable REGION consists of locations that are all very close to each other, geographically speaking.
  • are conceptually related, while a similar conclusion can also be applied to the Frozen and Fresh variables.
  • 2) PCA can be incredibly helpful when there are not enough data samples for a large number of input features.
  • In the model, the analyst made the choice based on the PCs' standard deviations being greater than 1.
  • Moreover, PCA is a black box and lacks interpretability, because all the PCs are statistically independent and are linear combinations of all the original input features.
  • 2) I will start to combine all the work done into a PowerPoint and come up with a good story to present to the company, as well as to the school for the final.
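The "keep PCs with standard deviation greater than 1" rule above can be illustrated on a two-variable toy example, where the component variances are the eigenvalues of the 2x2 covariance matrix. The data is invented; the report applied the same criterion to more dimensions.

```python
# Sketch: component selection by "standard deviation > 1" on two correlated
# toy variables, using the eigenvalues of the 2x2 covariance matrix.
import math

def eigvals_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]]."""
    mid = (a + c) / 2
    off = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return mid + off, mid - off

def covariance(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

x = [-1.5, -0.5, 0.0, 0.5, 1.5]   # two strongly correlated variables
y = [-1.4, -0.6, 0.1, 0.4, 1.5]
a, c, b = covariance(x, x), covariance(y, y), covariance(x, y)
lam1, lam2 = eigvals_2x2(a, b, c)
# a component's standard deviation is the square root of its eigenvalue
kept = [math.sqrt(l) for l in (lam1, lam2) if math.sqrt(l) > 1.0]
```

Here the first component carries nearly all the variance and is the only one kept, mirroring how the criterion compresses correlated inputs (like Grocery/Detergents or Frozen/Fresh) into fewer components.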
paulkabita_126534_8315345_Progress Report 5.pdf
  • During the last two months, we concentrated on the opportunity study, data collection, and requirement gathering for our app.
  • As our research is mainly focused on building a health recommendation system for patients with chronic pain and cancer, we collected relatable data sources online.
  • We performed exploratory data analysis on patients with chronic pain who have gone through breast cancer surgery.
  • Task II: Analyze the collected data and build an article recommender system.
  • Task III: Create database schema structures for the system and build a web project as per the dashboard prototype.
  • We collected health advice articles online to build our initial database.
  • However, as we are designing a new system, we faced the cold-start problem while building our model, which means we do not have enough data to find correlations between several users' choices.
  • Task III progress: After analyzing the system and the basic requirements of the recommender engine, we came up with a schema design with two tables.
  • Task II outcomes: We did the initial analysis and are in the process of building the recommender engine.
  • (a) Adding webpages to the web architecture. (b) Come up with a basic article recommender engine for our health application. (c) Correlation analysis between variables of the cancer patient dataset. (d) Find more relatable data sources and analyze them.
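A common workaround for the cold-start problem mentioned above is a content-based recommender that needs no user history. This is an assumed approach, not necessarily the team's; the articles and query are made up.

```python
# Sketch: content-based fallback for cold start -- rank articles by cosine
# similarity of term counts against a user's stated interest.
import math
from collections import Counter

def cosine(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

articles = {
    "a1": "managing chronic pain after surgery",
    "a2": "nutrition tips for busy families",
    "a3": "exercise plans for chronic pain relief",
}
query = "chronic pain exercise"
ranked = sorted(articles, key=lambda k: cosine(query, articles[k]), reverse=True)
```

Once real usage data accumulates, collaborative signals can be blended in to replace this purely content-based ranking.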
richterlyndsay_4676_8334820_Report5_Richter.pdf
  • Name: Lyndsay Richter Internship Organization: UNC Charlotte Student Affairs Research and Assessment (SARA) Mentors: Dr. Erin Bentrim and Dr. Ellissa Brooks Nelson (SARA) Dates: March 21 - April 3
  • Two weeks ago, I thought COVID-19 might be a relatively inconvenient but minor interruption to my full-time job and internship.
  • The last two weeks have been consumed by my responsibilities as the lead campus communicator in the Division of Student Affairs.
  • Working with Housing & Residence Life to execute communication activities related to the March 17 decision to ask students to vacate the residence halls by March 20, and associated communication for an exemption process.
  • Supervising my six direct reports and navigating their transition to remote and/or leave work environments.
  • Then came the unexpected news this week of the County's and State's request to use our South Village residence halls for coronavirus support, requiring a massive communication and logistics effort, currently underway, to organize the retrieval of belongings and the on-site move-out process of approximately 2,000 residential students.
  • I am going to re-assess my internship progress this weekend and reach out to Professor Hague on Monday to discuss next steps.
  • There is a third project that I would like to make progress on in April, but I realize it's better to have a conversation sooner rather than later in the event of any further surprises.
  • My internship supervisors are aware and supportive, and they understand, as they are managing their own professional and personal challenges.
sadikovibrokhim_125664_8334261_Report 5.pdf
  • Since last week, my team and I have made considerable movement on research and strategy development.
  • Also, we now need to prepare a report for the executive team on what rapid analytical actions can be offered to clients during such a pandemic.
  • We set up quick daily stand-up calls, agile style, to report our progress on tasks.
  • I am currently working to propose an innovative solution for optimizing debt collection.
  • I have been going through several research studies that mainly focus on machine-learning-enabled dialing optimization to maximize the recovery of bad debts, and I am also doing some hands-on analysis of different state-of-the-art methodologies to generate a delinquency scorecard for clients, so that they can be more proactive in preventing possible defaulters.
  • I have already built the executive summary section with interactive features using the Shiny dashboard tool.
  • As for task one, we are challenged to come up with a rapid action plan to mitigate a possible upcoming downturn in our clients' revenue generation.
  • Therefore, my deliverable would be a concise report of possible methodologies that could be implemented to optimize debt collection using different approaches.
  • It is actually cool, because I am challenging myself not only to do my assigned project but at the same time to be part of an innovation hub.
  • Set up a couple of meetings with managers, peers, and executives. Finalize the report together with a short presentation of my research on debt collection optimization. Complete and review the second tab section of the dashboard.
serapinzach_29510_8330435_Report5_Serapin.pdf
  • Zachary Serapin Wells Fargo Roy Cano March 21 - April 2nd
  • I initially wrote these statements in Excel, as it is easier to work with at the moment, but I have since gone back and translated the code to Python to help with reproducibility.
  • After flagging these, I started to analyze the behavior of these models through summary tables and visualizations, but I quickly realized I was suffering from information overload and tried to narrow the scope even more.
  • I added more conditional statements to track specific models based on their importance to the network and their activity levels.
  • My mentor pointed out that in order to correctly understand how new models are "maturing" in the network, I need to configure the data using "Month on Book".
  • Even though I thought I completely worked through the problem, I could have done a better job of trying to understand what I was after from the beginning, so I could have saved time.
  • projects, I was able to follow a fairly straightforward process where I would scrape or take a dataset and clean it in order to build a model.
  • I think it's important to plan ahead, just as you would when developing a model, to keep on track with the objective and not go down various "rabbit holes".
  • I can begin creating a dashboard with visualizations and tables that help stakeholders draw easy conclusions about how new models act as they mature through the network.
  • Additionally, I can drill down and see how a model's influence on the network is affected by particular attributes or by how long it has been in use.
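"Month on Book" as described above is the number of whole months between a model's go-live date and each observation month, which puts models of different ages on a common maturity axis. A minimal sketch, with hypothetical dates:

```python
# Sketch: computing Month on Book so new models can be compared at the
# same maturity regardless of when they went live. Dates are made up.
from datetime import date

def months_on_book(live_date, obs_date):
    return (obs_date.year - live_date.year) * 12 + (obs_date.month - live_date.month)

live = date(2019, 11, 1)
observations = [date(2019, 11, 15), date(2020, 1, 10), date(2020, 3, 31)]
mob = [months_on_book(live, d) for d in observations]
```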
shuklabalya_1256_8335823_Progress Report 5 - Balya Shukla.pdf
  • Name: Balya Shukla Internship Organization: Genpact Mentor/Preceptor's Name: Kaushik Chavan Dates of 2-Week Period Covered: 03/30/2020 – 04/03/2020
  • This week we focused on creating a rapid action plan, in light of COVID-19, specific to the debt collection industry, and continued training on Celonis.
  • Specific steps / progress
  • Task I progress: I am close to completing the training on Celonis.
  • Task II progress: I spent the week doing research on various debt collection startups that are disrupting the industry through AI, and on their action plans during COVID-19.
  • Task I outcomes: The goal of this task is to use Celonis to perform machine learning.
  • Task II outcomes: The goal of this task was to eventually take the action plan to Genpact's collection department and execute the plan using their internal data.
  • (a) Analytics can be used to optimize various business processes, even during a crisis.
  • (b) Strategy planning can take longer than execution.
  • (a) Complete the training modules and take the certification exam. (b) Complete the strategy report to present to the debt collection department.
singaravelmurali_110276_8317951_Report5_Singaravel.pdf
  • Muralidharan Singaravel Student Id - 801 059 720 Spring 2020 DSBA Internship Progress Report 5
  • Current Tasks: My internship project is titled "Customer Complaint Analysis Using Machine Learning", and the main objective of the project is to analyze customer complaint data and answer leadership's questions by identifying key and emerging trends, volumes, themes, and insights that will help in developing root cause analytics.
  • For these two weeks, my planned tasks were to develop and test models using various machine learning algorithms: classification, to classify consumer complaints into predefined categories, and regression, to predict the reasons for customer complaints.
  • The dependent variables for both research questions are binary, so I used classification algorithms like logistic regression, decision trees, and Naïve Bayes.
  • As part of the COVID-19 analysis project, I worked on designing the dashboard showing the growth of COVID-19-related complaints by day and, using geo maps in Tableau, showed the count of complaints by US state.
  • Outcomes/Takeaways: Identifying the features that result in a complaint closing with monetary relief is important, and the bank can learn from and fix the issues that resulted in such complaints, therefore reducing the monetary relief to customers.
  • I used Python for the above modeling, and the output was a presentation to senior leadership with the findings.
  • A few insights from the COVID-19 Tableau dashboard: the number of complaints peaked around March 11, when the WHO declared the COVID-19 outbreak a global pandemic, and California and Florida had the most complaints as panic spread among the bank's customers.
  • 2-Week Plan: For the next two weeks, my task is to analyze unstructured customer complaint text data, such as summary and resolution comments, and perform topic modeling, trend analysis, and key phrase extraction; it will consist of sentiment analysis, text clustering, text categorization, and ontology learning.
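Of the classifiers named above, Naïve Bayes is compact enough to sketch end to end. This is a toy multinomial Naïve Bayes with Laplace smoothing on made-up complaint snippets and made-up category labels, not the project's model or data.

```python
# Sketch: multinomial Naive Bayes text classifier with Laplace smoothing.
import math
from collections import Counter, defaultdict

def train_nb(docs):
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify(text, class_counts, word_counts, vocab):
    total_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label, n_docs in class_counts.items():
        lp = math.log(n_docs / total_docs)               # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            lp += math.log((word_counts[label][w] + 1) / denom)  # smoothed
        if lp > best_lp:
            best, best_lp = label, lp
    return best

docs = [
    ("charged a fee i did not authorize", "fees"),
    ("unexpected overdraft fee on my account", "fees"),
    ("my card was declined at the store", "card"),
    ("new card never arrived in the mail", "card"),
]
model = train_nb(docs)
pred = classify("overdraft fee charged again", *model)
```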
summeykelsey_1592_8339397_Progress Report 5.pdf
  • Assisting the Large Account Management team. c. Conduct predictive analytics.
  • d. Continue to work on visualizations in Power BI. e. Prepare for our April Spring Meeting (online).
  • My weekly meetings with my Vice President are to ensure I am staying on top of my
  • This past week, my Vice President asked me to help LAM with some of their data initiatives and requests in various reports, so I have been assisting them.
  • Since everyone is working from home and only essential employees can be on our VPN at certain times, many of my meetings got cancelled.
  • In our April Spring Meeting, I will be conducting a training for the entire team in
  • pandemic allows me to stand with our Company's tradition of coming together to help each other in times of need.
  • Coronavirus has impacted my timeline completely, and I hope I'll be able to get back on track by the presentation date.
  • Train others. c. Reach out to other people, even outside of your department/organization. d. Run tests. e. Practice makes perfect.
  • Conduct predictive analytics. b. Edit presentation. c. Adapt my duties as needed due to COVID-19.
tomasikmarie_126529_8170274_Progress_Report_5.pdf
  • Name : Internship Organization : Mentor 's Name : Dates of 2-Week Period Covered :
  • Current Tasks Task 1 was to use the already built API to get BLS data for 2019 and test in the model .
  • Task 2 was to gather data for additional hypotheses .
  • Task 3 was to clean the data for the additional hypotheses .
  • Task 4 was to help with a side project , doing cluster analysis for the engagement survey questions .
  • Used the API I built a few months ago to gather data from 2019 .
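For reference, a 2019 pull like the one above can be sketched against the public BLS timeseries API (v2). The series ID below (the national unemployment rate) is an illustrative placeholder, not necessarily the series this model uses, and the request is assembled but not sent.

```python
import json
import urllib.request

# Public BLS timeseries endpoint (v2). LNS14000000 is the national
# unemployment rate, used here only as an illustrative placeholder.
BLS_URL = "https://api.bls.gov/publicAPI/v2/timeseries/data/"
payload = {
    "seriesid": ["LNS14000000"],
    "startyear": "2019",
    "endyear": "2019",
}

def build_request(url=BLS_URL, body=payload):
    """Assemble (but do not send) the JSON POST request for the 2019 pull."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"})

req = build_request()
# urllib.request.urlopen(req) would return the JSON response when online.
```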
  • Using R , I did cluster analysis on the engagement survey questions to see if they should be grouped in a different way than they currently are .
  • Adding the new data did not make local union membership significant, so the model remains the same.
  • The cluster analysis came up with a slightly different grouping than is currently being used .
  • Finalize updated model ● Try model with only non-union plant data ● Present updated models to stakeholders ● Feature selection for engagement survey project
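The cluster analysis described here was done in R; a minimal Python analogue of the idea (grouping survey questions by hierarchically clustering "1 minus correlation" distances between items) might look like this, with synthetic responses standing in for the real engagement survey.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Synthetic survey: 100 respondents x 6 items, built so items 0-2 follow
# one latent factor and items 3-5 another.
factor_a = rng.normal(size=(100, 1))
factor_b = rng.normal(size=(100, 1))
items = np.hstack([factor_a + 0.3 * rng.normal(size=(100, 3)),
                   factor_b + 0.3 * rng.normal(size=(100, 3))])

# Cluster the *questions* (columns) using 1 - correlation as distance, so
# items that respondents answer similarly land in the same group.
dist = 1 - np.corrcoef(items.T)
condensed = dist[np.triu_indices(6, k=1)]  # condensed pairwise distances
groups = fcluster(linkage(condensed, method="average"),
                  t=2, criterion="maxclust")
```

Comparing `groups` against the survey's current section labels is one way to see whether the data suggest a different grouping, as the report describes.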
vavilalasrivan_19848_8338716_Internship - Progress Report #5.pdf
  • Name : Srivan Vavilala Company : Vishion Mentor : Gurtej Singh Work Period : Mar .
  • At the moment , my work mainly involves performing exploratory analysis on datasets from numerous affiliates .
  • The main concern now is getting the data cleaned in time to begin working on the modeling portion of the internship .
  • My approach right now is to continue learning NLP best practices.
  • So far , I ’ ve manually looked through each of the datasets in order to clearly understand how best to combine them down the road .
  • Mainly, I had to put a lot of time into researching, much more than I expected.
  • The tagging system idea that we're aiming for is also much more fleshed out.
  • Always run your ideas by your supervisors when learning about a new area, as they can offer valuable guidance.
  • NLTK is extremely versatile and will work well for the categorization project .
  • Understand how the data needs to be further cleaned for the categorization problem
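As a toy illustration of NLTK-based categorization, a `NaiveBayesClassifier` can be trained on bag-of-words feature dicts; the product words and tags below are made up and do not reflect Vishion's actual taxonomy or data.

```python
from nltk.classify import NaiveBayesClassifier

# Toy training data: bag-of-words feature dicts with hypothetical tags.
train = [
    ({"word_velvet": True, "word_cushion": True}, "sofa"),
    ({"word_sectional": True, "word_cushion": True}, "sofa"),
    ({"word_bulb": True, "word_shade": True}, "lamp"),
    ({"word_led": True, "word_shade": True}, "lamp"),
]

clf = NaiveBayesClassifier.train(train)

# Classify an unseen description that mentions only "cushion".
label = clf.classify({"word_cushion": True})
```

In practice the feature dicts would come from NLTK tokenization of the affiliate descriptions rather than being hand-written.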
vegesnakovidh_31534_8339351_Report5_Vegesna.pdf
  • Name : Internship Organization : Mentor / Preceptor 's Name : Dates of 2-Week Period Covered :
  • I was not able to accomplish as much as I wanted to because of other things I had to take care of and the current pandemic.
  • There are some sections of the code that I am unsure how to apply to the problem we are trying to solve.
  • I will continue to meet with Dr. Lo and her team to better understand the problem they are facing and clarify other questions along the way.
  • The first task (same as last week) was testing some of the functionalities in the example code on a sample of the data provided by Dr. Lo.
  • I looked for possible packages and libraries in R commonly used for this type of problem .
  • The second task (same as last week) was organizing all the data files and code on GitHub.
  • A major part of a project is making sure all the necessary files and documents are organized properly .
  • I uploaded all the code, files, and data onto a separate branch on GitHub.
  • The other task is to get the setup for the R package started.
xiachunqiu_116382_8317275_Report5_Xia.pdf
  • Internship Organization : University of North Carolina , Belk College
  • The current project is to explore customers' behaviors in the Yelp dataset.
  • In addition, I will try to estimate the TVEM (time-varying effect model) and the coefficients of each independent variable.
  • Last time, the linear mixed model included only one random effect, meaning only one independent variable had a random effect.
  • The coefficients of the random effect vary over time.
  • However, the coefficient is quite small and does not have much impact on the dependent variable.
  • I also drew graphs of the random-effect coefficients against years, but the changes over time are not large.
  • I don't know whether this is a good sign for our data; therefore, I need to do more research on how these random effects affect the dependent variable.
  • Currently, I don't think I did a good job fitting this model to our Yelp dataset.
  • The meaning of the outcome is not clear; I need to read more examples to understand the whole model and apply it to our dataset.
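A linear mixed model with a single random effect (here, a random slope only) can be sketched in Python with statsmodels; the panel below is synthetic and the variable names are placeholders, not the actual Yelp fields or the report's own tooling.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Synthetic stand-in for the Yelp panel: 30 businesses x 8 years, where the
# effect of review_count on rating drifts by business (a random slope).
n_biz, n_years = 30, 8
df = pd.DataFrame({
    "business": np.repeat(np.arange(n_biz), n_years),
    "year": np.tile(np.arange(n_years), n_biz),
    "review_count": rng.normal(size=n_biz * n_years),
})
slope = rng.normal(0.5, 0.1, size=n_biz)  # per-business random slope
df["rating"] = (3.5 + slope[df["business"]] * df["review_count"]
                + 0.2 * rng.normal(size=len(df)))

# One random effect only (a random slope for review_count, no random
# intercept), mirroring the single-random-effect specification above.
model = smf.mixedlm("rating ~ review_count", df, groups=df["business"],
                    re_formula="0 + review_count")
fit = model.fit()
```

The estimated random-effect variance in `fit` is the quantity whose size the report is questioning; plotting the per-group random effects against time parallels the graphs described above.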
xiaodiwen_117612_8337803_Report5_Xiao.pdf
  • Name: Diwen Xiao Internship Organization: UNC Charlotte Mentor/Preceptor's Name: Professor Ming Chen Dates of 2-Week Period Covered in the Progress Report: 03/23/2020-04/03/2020
  • Task I: Conduct the model estimation process using a temporal dynamic model in the software package R. Task II: Draw different types of figures for the detected areas of interest of the tested volunteers.
  • Specific Steps/Progress Task I: For the model estimation process using the temporal dynamic model, I first researched related studies that use temporal network analysis.
  • Then I worked through a temporal social network analysis tutorial and downloaded the related packages.
  • Task II: For the eye-fixation matching and synchronization process, I first drew different types of figures in the professional software for the different Areas of Interest (AOIs) that people may attend to, such as brands, price, and text.
  • Outcomes Task I outcomes: I have obtained an image from R for the temporal social network analysis, a static visualization showing every workshop and collaboration from the previous field experiment.
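A static collapse of a temporal collaboration network, like the image described above, can be sketched with networkx in Python (the report's own analysis was done in R); the people and time stamps below are invented for illustration.

```python
import networkx as nx

# Invented collaboration events: (person_a, person_b, workshop_time).
events = [("ana", "ben", 1), ("ben", "cal", 1),
          ("ana", "cal", 2), ("ben", "cal", 2), ("ana", "dee", 3)]

# Collapse the event stream into one static graph, keeping time stamps as
# edge attributes; this is the same flattening a static visualization does.
G = nx.MultiGraph()
for a, b, t in events:
    G.add_edge(a, b, time=t)

degree = dict(G.degree())  # collaborations per person in the static view
# nx.draw(G) would render the static picture described above.
```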
  • Task II outcomes : I have drawn different types of figures in the professional software for different AOIs .
  • Takeaways/Lessons-Learned 1) Learned and understood how the temporal dynamic model works, and how to use the related packages in R. 2) Learned the main goals of temporal dynamic analysis.
  • 3) Conduct a robustness check based on the model estimation results, i.e., check how the core and most important regression coefficient estimates behave, in order to validate the model and make sure both the data and the results are reliable.
  • 4) Try other classifiers, such as ANN and Random Forest, for the model evaluation process if time allows.